REPORT SUMMARY


In Section 1, Baxter discusses and demonstrates what
parameter adjustments can accomplish in his method of
removing motion blur from photographic images.

Section 2 indicates the status of work by Faugeras in
color vision, and our expected method of reporting these
results.

In Section 3, Boll describes a method for strikingly
increasing the perceived quality of synthetic speech.
Additional computation at the receiver is used to generate
two channels (i.e., binaural) of sound for a stereo
headphone set. This method requires no change in the
existing generation and transmission processes and
algorithms.

In Section 4, Petersen demonstrates one method of
removing noise from speech. Intelligible speech has been
generated from input signals that contain +18 dB of noise.

In Section 5, Callahan reviews his method of
suppressing noise in a one-dimensional signal stream (e.g.,
speech) by using two-dimensional processing. A complete
Technical Report, UTEC-CSc-76-209, will be available
approximately concurrently with this report.

Sections 6 and 7 report the start of work in the
coding of speech and in the mathematical theory of human
perception. Both indicate our future directions in these
fields.

Section 8 outlines the Image Understanding Research
by Newell that has been proposed for the future.


SECTION 1


REMOVING MOTION BLUR FROM
PHOTOGRAPHIC IMAGES
Brent Baxter


In  the  previous  semi-annual  report  a new  method  was 
described  for  removing  blur  from  photographic  images. 
Several  experiments  are  described  here  illustrating  the 
effect  of  adjusting  the  cutoff  frequency  of  filters  used  in 
the restoration scheme.

The  method  operates  by  high  pass  filtering  the 
logarithm  of  the  blurred  image  spectrum  and  then  adding  the 
low  pass  log  spectrum  of  a different,  unblurred  image.  This 
combination  accomplishes  two  important  results:  First, 
blurring  represents  a partial  loss  of  signal  energy  at  high 
spatial  frequencies,  and  adding  the  prototype  information 
after  filtering  tends  to  restore  energy  at  these  frequencies 
to  the  proper  level.  Second,  excessive  amplification  of 
film  grain  is  prevented.  Signal  energy  at  spatial 
frequencies near zeroes of the blur spectrum cannot be
recovered completely, and any attempt to do so will result in
excessive amplification of film grain and other kinds of
system noise. To see how this is avoided, note that zeroes
in the blur spectrum become sharp, spike-like negative
impulses when the logarithm is taken, and these impulses are
preserved in the high pass filter operation. This prevents
undesirable amplification at those spatial frequencies
dominated by noise.

In the restored images described below, the low pass
and high pass filters were constrained to have frequency
responses, the sum of which was a constant K:

$$\mathrm{LPF}(\omega) = K - \mathrm{HPF}(\omega)$$

This was done to avoid problems in preserving the average
brightness of the image.
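
The combination of the two filtered log spectra can be illustrated with a minimal sketch. It assumes NumPy, a Gaussian low pass shape, and K = 1; the function names and the `cutoff` parameter are hypothetical and are not taken from the report. Only the magnitude is processed here; the phase of the blurred image is reused unchanged.

```python
import numpy as np

def lowpass_response(shape, cutoff):
    """Gaussian low pass response on the 2-D DFT grid (an assumed filter shape)."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.exp(-(fx**2 + fy**2) / (2.0 * cutoff**2))

def restore_log_spectrum(blurred, prototype, cutoff=0.05, eps=1e-8):
    """Add the high passed log spectrum of `blurred` to the low passed
    log spectrum of `prototype`, with LPF = 1 - HPF."""
    B = np.fft.fft2(blurred)
    P = np.fft.fft2(prototype)
    log_b = np.log(np.abs(B) + eps)
    log_p = np.log(np.abs(P) + eps)
    # Filter each log spectrum by periodic convolution, carried out as a
    # multiplication of its 2-D transform by the filter response.
    lpf = lowpass_response(blurred.shape, cutoff)
    hpf = 1.0 - lpf
    hp_b = np.real(np.fft.ifft2(np.fft.fft2(log_b) * hpf))
    lp_p = np.real(np.fft.ifft2(np.fft.fft2(log_p) * lpf))
    log_mag = hp_b + lp_p
    # Reuse the phase of the blurred image; phase correction is a separate step.
    restored = np.fft.ifft2(np.exp(log_mag) * np.exp(1j * np.angle(B)))
    return np.real(restored)
```

Because the two responses sum to one, using an image as its own prototype returns that image essentially unchanged, which is a convenient sanity check on the constraint.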

Adjustments  to  the  cutoff  frequency  of  these  filters 
demonstrated  that  those  filters  with  a basically  circular 
shape  tended  to  allow  some  of  the  predominant  features  of 
the prototype to appear in the restored image. This effect
is shown in Figure 1. Tailoring the shape of the filter
frequency  response  as  described  above  tends  to  minimize  the 
effect  as  shown  in  Figure  2.  Figure  3 is  the  image  used  to 
construct  the  prototype  spectrum.  Notice  the  strong 
diagonal  character  in  Figure  1 due  to  the  ladder  in  the 
prototype  image.  In  a practical  system  this  effect  could  be 
minimized  by  averaging  log  spectra  from  several  images  as 
well  as  by  using  the  noncircular  frequency  responses 
mentioned  above.  Using  elongated  filters  has  the  effect  of 
restoring  the  image  only  in  the  direction  of  the  blur  while 
leaving it undisturbed in other directions.

Reducing the cutoff frequency of the filters tends to
make the restoration less noisy and also somewhat less sharp.
Figures 4 and 5 are examples of restorations obtained using
filter cutoff frequencies one eighth as high as those shown
previously. At present there is no systematic way to select
the optimum cutoff frequency; however, it may be varied over
wide limits with only a moderate effect on the restoration.


The image chosen for this study (Figure 6) was taken
with a pocket Instamatic camera on 16 mm Kodacolor film, in
a completely  unrehearsed  manner.  The  film  is  quite  small 
and  rather  grainy  resulting  in  a somewhat  noisy  restoration. 
Considerably  better  results  are  obtainable  if  care  is  taken 
to  record  the  blurred  image  on  a larger  size,  fine  grain 
film.  Fine  grain  development  also  helps.  Nevertheless,  the 
model  seems  to  describe  the  blur  process  well  enough  for  the 
method  to  work  quite  well.  Stripes  can  be  seen  in  the 
woman's  blouse  that  were  not  at  all  visible  in  the  original 
blurred  image;  their  existence  has  been  confirmed  by 
comparison  with  the  actual  garment. 


Correcting the phase of the image transform may be
accomplished in a variety of ways as described by Cannon
[1], but in the present case advantage was taken of a streak
in the background formed by the reflection of a flash bulb
in a spherical globe. The length and direction of the
streak were used to compute the phase of the blur and also to
determine the proper orientation of the high pass filter.
The  presence  of  this  streak  made  it  possible  to  evaluate 
the  effect  of  changing  filter  cutoff  frequencies 
independently  of  the  problem  of  estimating  the  phase  and 
direction  of  the  blur. 

Periodic convolution was used for the filtering
operations rather than the more common aperiodic techniques
(Cole [2]), making it necessary to preprocess the borders of
the image. The edges of the blurred image were simply
extended out from the borders and smoothly scaled down to
zero intensity. This operation is intended to simulate the
effect of blurring across boundaries of adjacent copies of
the image. It might be argued that this is not a good
simulation of a blur; however, it works well enough that
essentially no edge effects are noticeable in the restored
image. See McDonnell and Bates [3] for additional
discussion of this topic.
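
A minimal sketch of this border preprocessing follows, assuming NumPy. The raised-cosine taper and the names `taper_window` and `extend_and_taper` are illustrative choices; the report does not state which scaling function was used.

```python
import numpy as np

def taper_window(n, pad):
    """Window of length n + 2*pad: unity over the image, raised-cosine
    fall to zero across the padding on each side (an assumed taper)."""
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(pad) / pad))  # starts at 0
    w = np.ones(n + 2 * pad)
    w[:pad] = ramp
    w[n + pad:] = ramp[::-1]
    return w

def extend_and_taper(image, pad):
    """Replicate the border pixels outward by `pad` samples, then scale the
    extension smoothly down to zero intensity at the outer edge."""
    if pad < 1:
        raise ValueError("pad must be at least 1")
    padded = np.pad(image, pad, mode="edge")
    wy = taper_window(image.shape[0], pad)
    wx = taper_window(image.shape[1], pad)
    return padded * np.outer(wy, wx)
```

The original pixels are left untouched; only the extension is attenuated, so the periodic copies implied by the DFT meet at zero intensity.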

This  method  is  capable  of  removing  a wide  class  of 
blurs  from  ordinary  photographic  images  when  the  exact 
nature of the blur system is not known. We are presently
investigating the applicability of these ideas to more
sophisticated blur removal processes such as the speckle
interferometry technique of Labeyrie [4].

FIGURES FOR SECTION 1

Figure  1 --  Image  restored  using  filters  having  circular 
symmetry 


Figure  2 --  Image  restored  using  filters  with  elongated 
frequency  responses  oriented  perpendicular  to 
the  direction  of  the  blur 


Figure  3 --  Sharp  image  used  to  create  a prototype  log 
spectrum 


Figure 4 -- Image restored as in Figure 1 but with lower
cutoff  frequency  filters 


Figure  5 --  Image  restored  as  in  Figure  2 but  with  lower 
cutoff  frequency  filters 


Figure 6 -- Blurred image taken with a pocket Instamatic
camera 



SECTION 2


COLOR  IMAGE  PROCESSING 
Olivier Faugeras

The  theoretical  background  of  the  work  of  Faugeras  was 
reported in the last Semi-annual Report. During these six months,
the Comtal display has been integrated into our available
computing facility, and experiments are continuing. Color
photographic pictures have been produced, but the expected
results have not yet been fully obtained. We
anticipate  that  this  work  will  be  ready  for  publication  in  a 
separate  Technical  Report  (TR)  during  the  late  summer  of 
1976.  We  expect  to  publish  this  TR  with  color  pictures  in  a 
limited  quantity,  but  we  do  not  expect  to  include  color 
pictures  in  the  Semi-annual  Report  series. 

SECTION 3

SPEECH ENHANCEMENT AND CODING


IMPROVING SYNTHETIC SPEECH QUALITY USING BINAURAL REVERBERATION

Steven  F.  Boll 


The degrading characteristics of synthetic speech, such
as minimum phase effects, pitch and voicing errors, and
spectral distortions, are more evident when the speech is
listened to on headphones than when heard in a room over a
loudspeaker. Listening to monaural sound in a room over a
loudspeaker differs from headphone listening in two major
respects: one, a different sound source is presented to
each ear (binaural reproduction); and two, the sound source is
altered by the room's acoustics (reverberation). An
experiment was conducted to include the effects of binaural
reverberation on synthetic speech heard on headphones. To
achieve this effect, the impulse response of a 20' x 20'
classroom was first measured by applying an electrical pulse
to a loudspeaker and recording the resulting
room-loudspeaker impulse response as measured by two
microphones spaced the distance between the ears apart. Figure 1 shows
the microphone placements within a dummy head. Figures 2
and 3 show the measured left and right impulse responses and
Figures 4 and 5 the respective frequency responses. These
impulse responses were then convolved with the speech and
played through each headset channel. Results demonstrate
that this process not only suppresses the
characteristic distortions of the synthetic speech, but
also externalizes the sound source, giving the effect of
non-headphone listening. An example showing how this
process reduces the minimum-phase "buzzy" quality is given
in Figures 6 through 8. Figure 6 is of an original vowel,
Figure 7 is the corresponding synthetic speech with its
abnormally high peak factor, and Figure 8 the
post-processed, convolved response. An example showing how
this process reduces the effect of a voicing error is given
in Figures 9 through 11. Figure 9 is a segment of an
original fricative /sh/, Figure 10 is the corresponding
synthetic speech generated with the incorrect voicing
decision, and Figure 11 the post-processed, convolved
response.
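
The receiver-side processing amounts to two convolutions, one per ear. The sketch below assumes NumPy and that the measured left and right impulse responses are available as arrays; the function name `binaural_reverb` is hypothetical.

```python
import numpy as np

def binaural_reverb(speech, h_left, h_right):
    """Convolve monaural speech with the left and right room impulse
    responses and return a two-column (left, right) stereo signal."""
    left = np.convolve(speech, h_left)
    right = np.convolve(speech, h_right)
    n = max(len(left), len(right))
    out = np.zeros((n, 2))
    out[:len(left), 0] = left
    out[:len(right), 1] = right
    return out
```

Since only the received monaural signal is touched, nothing in the speech generation or transmission chain needs to change, as noted in the report summary.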

Matching  and  Coding  of  Nonlinear  Spectral 
Estimates  by  Linear  Prediction 

Introduction. This research considered applying the
spectral matching properties of linear prediction analysis
to nonlinear spectral estimates. Three areas are
considered: one, the matching and coding of the log
magnitude spectrum by LPC (Cepstral Prediction); two, the
estimation of spectral zeros as well as poles by matching to
the derivative of the log magnitude spectrum (ramp
modulated cepstral prediction); and three, the modeling and
coding of the spectrum modified by middle and inner ear
nonlinearities (the inner spectrum).

Predictive Spectral Matching. Linear
Prediction defines a technique for matching a given power
spectrum $P(\omega)$ by an all-pole power spectrum $\hat{P}(\omega)$. From its
set of autocorrelations R(k), a set of predictor
coefficients a(k) and gain factor G are computed which
minimize the loss function:

$$E = \frac{G^2}{2\pi}\int_{-\pi}^{\pi}\frac{P(\omega)}{\hat{P}(\omega)}\,d\omega$$

where

$$\hat{P}(\omega) = \frac{G^2}{\left|\sum_{k=0}^{p} a(k)\,e^{-jk\omega}\right|^2}, \qquad a(0) = 1.$$

The coefficients a(k) are computed by solving the Toeplitz
system of equations:

$$\sum_{k=1}^{p} a(k)\,R(i-k) = -R(i), \qquad 1 \le i \le p$$

where

$$R(k) = \frac{1}{2\pi}\int_{-\pi}^{\pi} P(\omega)\cos(k\omega)\,d\omega$$

with

$$G^2 = R(0) + \sum_{k=1}^{p} a(k)\,R(k).$$

Thus the analysis procedure consists of starting with a
given $P(\omega)$, inverse transforming to a set of R(k), and
solving for a(k) and G.
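
The analysis procedure can be sketched numerically as follows. This is an illustration under stated assumptions: NumPy, a power spectrum sampled on a uniform DFT grid, the inverse FFT standing in for the autocorrelation integral, and a dense linear solve instead of a Levinson recursion. The function names are hypothetical.

```python
import numpy as np

def lpc_from_power_spectrum(P, order):
    """P holds samples of an even power spectrum at w = 2*pi*n/N, n = 0..N-1.
    Returns predictor coefficients a(1..p) and the squared gain G^2."""
    # The inverse DFT approximates R(k) = (1/2pi) * integral of P(w) cos(kw) dw.
    R = np.real(np.fft.ifft(P))[:order + 1]
    # Toeplitz system: sum_k a(k) R(i-k) = -R(i), for 1 <= i <= p.
    T = np.array([[R[abs(i - k)] for k in range(order)] for i in range(order)])
    a = np.linalg.solve(T, -R[1:order + 1])
    G2 = R[0] + np.dot(a, R[1:order + 1])
    return a, G2

def all_pole_spectrum(a, G2, n_freq):
    """Evaluate G^2 / |sum_{k=0}^{p} a(k) e^{-jkw}|^2 with a(0) = 1."""
    A = np.fft.fft(np.concatenate(([1.0], a)), n_freq)
    return G2 / np.abs(A) ** 2
```

When the input spectrum is itself all-pole of the same order, the procedure returns the coefficients that generated it, which gives a simple check of the sign conventions.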


Nonlinear Spectral Matching. The same analysis
procedure can be followed where now a function $f(P(\omega))$
replaces $P(\omega)$ in the matching process. The loss function to
be minimized is given by

$$E' = \frac{G^2}{2\pi}\int_{-\pi}^{\pi}\frac{f(P(\omega))}{T(\omega)}\,d\omega$$

$T(\omega)$ is an all-pole power spectrum which is computed to
minimize E'. It is estimated using the same procedure as
above, namely let

$$U(k) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(P(\omega))\cos(k\omega)\,d\omega$$

and solve

$$\sum_{k=1}^{p} q(k)\,U(k-i) = -U(i), \qquad 1 \le i \le p$$

for q(k), giving

$$T(\omega) = \frac{G^2}{\left|\sum_{k=0}^{p} q(k)\,e^{-jk\omega}\right|^2}, \qquad q(0) = 1,$$

with

$$G^2 = U(0) + \sum_{k=1}^{p} q(k)\,U(k).$$

Three areas of this approach to nonlinear spectral
modeling are considered.

Cepstral Prediction. By matching to $\log P(\omega)$, an
LPC estimate of the real cepstrum is obtained. This minimum
phase  cepstral  estimate  is  uniquely  defined  by  either  its 
set  of  reflection  or  predictor  coefficients.  This  method 
allows  for  waveforms  to  be  analyzed  by  homomorphic 
techniques  but  coded  using  LPC  parameters;  and  thus  allows 
for  the  interfacing  of  a homomorphic  vocoder  analyzer  with  a 
transmission  channel  set  up  to  transmit  LPC  parameters. 

Ramp Modulated Cepstral Prediction. A linear
prediction spectral match to the derivative of $\log P(\omega)$ is obtained by
applying  linear  prediction  analysis  to  the  ramp  modulated 
real  cepstrum.  The  spectral  peaks  of  the  resulting  filter 
are  then  matched  to  peaks  representing  both  poles  and  zeros 
of  the  prototype  spectrum.  For  prototype  spectra  where  zero 
approximation  is  important  (such  as  for  matching  narrow  stop 
bands  for  dynamically  changing  noise  suppression  filters), 
the  ramp  modulated  cepstrum  analysis  offers  an  efficient 
technique  for  zero  estimation. 
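
The connection between the derivative of the log spectrum and the ramp modulated cepstrum can be made explicit with a short worked relation. It is a sketch under the usual assumption that $\log P(\omega)$ has a Fourier series with real, even cepstral coefficients; the symbol c(n) is introduced here for illustration only.

$$\log P(\omega) = \sum_{n=-\infty}^{\infty} c(n)\,e^{-j\omega n}
\quad\Longrightarrow\quad
\frac{d}{d\omega}\log P(\omega) = \sum_{n=-\infty}^{\infty} \bigl(-j\,n\,c(n)\bigr)\,e^{-j\omega n}$$

Differentiating with respect to frequency therefore multiplies the cepstrum by the ramp n (up to the factor -j), so applying linear prediction to n c(n) corresponds to matching the derivative of the log spectrum rather than the log spectrum itself.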

Spectra Modified by Ear Nonlinearities.
There is evidence that the spectrum of the signal applied to
the external ear is modified by nonlinear distortion in the
middle and inner ear. The resulting "inner spectrum" is the
one converted by hair cells into neural discharges.
Inclusion of approximations to these spectral nonlinearities
prior to linear prediction matching is continuing to be
considered in order to improve speech quality of a low
bandwidth speech analysis-synthesis system.


A-D/D-A Diagnostic Checkout Procedures. In cooperation
with the ARPA-IPT Network Speech Compression group, Utah
provided a package of various programs and essential
peripheral equipment specifications required in order to
test and maintain A-D and D-A converters (NSC Note 74).

In the enclosed package were copies of the various
programs and specifications of equipment used at Utah to
maintain our 15 bit A-D converters and our 16 bit D-A
converters. This package is intended to augment the
procedures and theory developed in Chin Moh Tsai's Master's
Thesis entitled "A Digital Technique for Testing A-D and D-A
Converters". Copies of Tsai's thesis were mailed to the NSC
group on June 20, 1974. Included are the following:

1. Specifications for a low noise/distortion
Krohn-Hite oscillator: Model 4024.

2. Specifications for an external distortion
measurement system: Model 1700A (Sound Technology).

3. Descriptions of two programs used to test A-D and
D-A converters: ADTEST.SAV and AUDIO.SAV.

4. A copy of "A-D and D-A Converter Testing and
Maintenance/Photolog".

The theory supporting the checkout procedures is presented
in Tsai's thesis. The programs ADTEST.SAV and AUDIO.SAV are
two extended versions of the procedures defined in the
thesis. They both make essentially the same measurements
and thus only extended documentation of the latter is
provided.

Waveform  Matching  Using  Linear  Predictive  Coding. 
Essential  to  the  design  of  systems  for  speaker 
authentication,  variable  vocoder  frame  rate  transmission, 
word  spotting  in  continuous  speech,  and  isolated  word 
recognition is the requirement for comparing a reference
waveform to an input waveform. A theoretical development
for comparing two waveforms using linear predictive coding
was considered during the period 1 January 1975 through 20
June 1975. The results of this investigation were reported in
University of Utah Computer Science Memorandum No. 7500,
May 1975, and distributed to the ARPA Network Speech
Compression group as NSC Note 60.

Based upon this theoretical development, an isolated
word recognition system using the linear prediction residual
was developed [2], [3]. Results demonstrating the
effectiveness of this method for comparing speech waveforms
were evidenced by a recognition accuracy of 96.1% when the
vocabulary consisted of 107 flight commands having an
average of two syllables per word.

Optimal Time Registration Using Dynamic
Programming. Essential to any procedure used to compare two
waveforms is the need for aligning the reference pattern
with the input pattern. For this isolated word recognition
program, a modified Dynamic Programming procedure was used.
This technique defined a nonlinear time warping function
which attempts to align two waveforms of different lengths
so as to minimize the total distance between them.
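
A generic dynamic programming alignment can be sketched as below. This is the textbook form, not the modified procedure referred to above: it assumes NumPy, a Euclidean frame distance in place of the linear prediction residual distance, and unconstrained steps. The function name `dtw` is hypothetical.

```python
import numpy as np

def dtw(ref, inp):
    """Align two frame sequences of different lengths. Returns the minimum
    total distance and the warping path as (ref_index, inp_index) pairs."""
    ref = np.asarray(ref, dtype=float).reshape(len(ref), -1)
    inp = np.asarray(inp, dtype=float).reshape(len(inp), -1)
    n, m = len(ref), len(inp)
    # Local frame-to-frame distances (Euclidean here, an assumption).
    d = np.linalg.norm(ref[:, None, :] - inp[None, :, :], axis=2)
    # D[i, j] is the best cumulative distance aligning the first i reference
    # frames with the first j input frames.
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = d[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end to recover the time warping function.
    path = [(n - 1, m - 1)]
    i, j = n, m
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: D[s])
        path.append((i - 1, j - 1))
    return D[n, m], path[::-1]
```

In a recognizer of this kind, the input word would be compared against each reference template and assigned to the one with the smallest total distance.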

The  theoretical  development  and  FORTRAN  procedures 
required to implement the isolated word recognition system
and  Dynamic  Programming  time  warping  function  were  provided 
to  the  ARPA  Network  Speech  Compression  group  as  NSC  Note  73. 


REFERENCES


[1] S.F. Boll, E. Ferretti, and T. Petersen, "Improving
Synthetic Speech Quality Using Binaural
Reverberation," Proc. of the IEEE Conf. on
Acoustics, Speech and Signal Processing, Philadelphia,
PA, April 1976.

[2] M.J. Coker, "An Isolated-Word Recognition System
Based On Linear Prediction Analysis," M.S. Thesis,
Elec. Eng. Dept., Univ. of Utah, 1975.

[3] M. Coker and S.F. Boll, "An Improved Isolated Word
Recognition System Based Upon the Linear Prediction
Residual," Proc. of the IEEE Conf. on Acoustics,
Speech and Signal Processing, Philadelphia, PA,
April 1976.


SECTION  4 


NOISE SUPPRESSION WITH LINEAR PREDICTION FILTERING

Tracy  L.  Petersen 


The  preceding  semi-annual  report[2]  describes  the 
development of a dynamic noise suppression filter based on
linear  prediction  spectral  modeling[1],  where  the  formula 
for  a Wiener  filter  is  implemented  to  construct  successive 
filter  estimates  over  short  time  intervals.  These  estimates 
then  determine  the  time-varying  characteristics  of  a linear 
prediction  lattice  filter. 

Continued  work  with  this  noise  suppression  model  during 
this  half  year  has  focused  on  suppressing  noise  from  noisy 
speech rather than singing voice. A series of experiments
was conducted to determine the relationship between
parameter conditions in the dynamic Wiener filter model and
performance of the model in effectively suppressing noise
from noisy speech. Main results were three-fold. The time
increment between successive filter estimates was reduced
from 150 milliseconds to 12 milliseconds, which prevented
excessive noise from appearing during rapid transitions in
the speech. Maximum attenuation was increased from -24 dB to
[illegible] dB, and the number of filter k-parameters was reduced from 90 to 64.
Following these modifications, tests were made where the
perceived suppression of noise was judged by experienced
listeners to be in the neighborhood of 18 dB while the
filtered speech remained perfectly intelligible. Upper
limits  on  the  level  of  noise  which  may  be  successfully 
suppressed  from  noisy  speech  have  not  yet  been  determined, 
but  results  indicate  that  even  higher  levels  of  noise  may  be 
suppressed  from  noisy  signals. 
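
The role of the maximum attenuation parameter can be illustrated with a textbook Wiener gain computed per frequency band. This sketch assumes NumPy, power estimates for the noisy signal and the noise in each band, and a simple power-subtraction estimate of the speech power. It is not the lattice filter implementation described above, and the function name is hypothetical.

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, max_atten_db=24.0):
    """Per-band Wiener gain S / (S + N), limited so that no band is
    attenuated by more than max_atten_db decibels."""
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    # Estimate the clean speech power by subtraction, clipped at zero.
    speech_power = np.maximum(noisy_power - noise_power, 0.0)
    gain = speech_power / np.maximum(noisy_power, 1e-12)
    # Convert the attenuation limit from dB to a linear amplitude floor.
    floor = 10.0 ** (-max_atten_db / 20.0)
    return np.maximum(gain, floor)
```

Re-estimating such a gain at short time increments is what lets the filter follow rapid transitions in the speech.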

Improving Synthetic Speech Quality by Modeling the LPC
Driving Function

When  an  LPC  synthesizer  is  excited  with  the  speech 
error  signal  (the  true  pitch  information)  the  speech  is 
reconstructed  as  the  original  (within  finite  word  length 
limits).  When  the  error  signal  is  coded  as  a pulse  train 
for  voiced  speech  or  as  white  noise  for  unvoiced  speech  the 
resulting synthesized speech contains an undesirable
distortion usually described as a "buzzy" quality. If the
error signal could be simulated at the synthesizer directly
from coded pitch information, presumably synthetic speech
quality would be greatly improved without increasing the
bandwidth of the channel signals. Recent work by Petersen
has focused on this problem specifically. It has been found
that a strong correspondence exists between the structure of
k-parameters (filter coefficients) derived from analysis on
an error signal and pitch information coded from that error
signal, allowing for the possibility of extrapolating a
time-varying set of filter coefficients from a prototype set
based on the input pitch information. Some initial studies
have shown that such a model, when driven with standard
pitch pulses, produces a waveform with characteristics
similar to the error signal. Further work and testing with
this model will be required to determine its degree of
usefulness in improving synthetic speech quality.

SECTION 5

SPEECH  PROCESSING  TO  REDUCE  NOISE 
AND  IMPROVE  INTELLIGIBILITY 
Michael  Wayne  Callahan 


A new  method  of  acoustic  signal  processing  has  been 
investigated  which  is  based  on  the  short-time  spectrum,  a 
two-dimensional  representation  that  shows  the  frequency 
content  of  the  signal  as  a function  of  time.  This 
representation  is  appropriate  for  signals  such  as  speech  and 
music,  where  the  natural  frequencies  of  the  source  change. 
In addition, this representation is similar to frequency
analysis in the human auditory system, so that signal
modifications can be related to perceptual criteria. This
method has been applied successfully to removal of broadband
background noise (signal-to-noise ratio about 30dB) and to
removal of high level interfering signals with strong
harmonic structure (signal-to-noise ratio about 26dB). Both
of  these  experiments  were  described  in  previous  reports. 

Research  during  the  last  period  has  been  directed  at 
isolating  speech  features  in  the  short-time  spectrum  which 
are  known  to  be  important  to  perception,  and  to  applying 
these  results  in  a compression/expansion  system  for 
transmitting  speech  through  a noisy  channel. 

Isolation Of Perceptually Important Speech Features:

Experiments  were  conducted  to  attempt  to  isolate 
selected  speech  features  by  bandpass  filtering  the  logarithm 
of the short-time spectrum. (The logarithm was taken to
model the approximately logarithmic sensitivity of the ear.) Three
speech features were selected as typical: pitch, formants,
and plosive noise bursts.

The results of these experiments are shown in Figures
1.b, 1.c, and 1.d, together with the original speech in
Figure 1.a. The pictures have been scaled to use the full
range of the film, so the figures do not show amplitude
relative to the original speech. The pitch, formants, and
plosive bursts have much lower dynamic range than the
original. Most of the dynamic range of the original is in
the slowly changing component of the short-time spectrum.
This is illustrated by Figure 1.e, which shows features
obtained by filtering the logarithm of the short-time
spectrum in two dimensions to suppress the slowly changing
component. The fact that the perceptually important
features are still apparent in Figure 1.e suggests that
speech processed in this manner should still be highly
intelligible, and this is in fact the case. Such speech
might therefore be more intelligible than normal speech in a
noisy environment. Informal listening tests and the results
discussed below support this notion.


FIGURE 1

Speech features obtained by two-dimensional filtering
of the logarithm of the short-time spectrum: (a) original
speech, (b) pitch, (c) formants, (d) plosive noise
bursts, (e) slowly changing component removed. The
speech is "the pipe began to rust".


Two-Dimensional  Compression  And  Expansion  Of  Speech: 




Oppenheim et al. [1] investigated a homomorphic system
for compression and expansion of acoustic signals. The
system models audio signals as a product of two components:
a slowly varying, positive envelope and a rapidly varying
bipolar signal. These multiplied signals can be mapped into
added signals by the logarithm and the envelope compressed
or expanded by linear filtering. A two-dimensional
(frequency/time) system for compression and expansion can be
obtained in a similar manner by modeling the short-time
spectrum as a product of two components: a slowly varying
envelope which is of lesser importance, and a rapidly
varying component which contains most perceptually important
features. Compressing the signal by attenuating the large,
slowly changing component should greatly reduce the dynamic
range of the signal while preserving the information
content.
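
The attenuation of the slowly changing component, and its inversion at the receiver, can be sketched as a pair of two-dimensional linear filters acting on the log magnitude of the short-time spectrum. The sketch assumes NumPy, a Gaussian low pass shape, and hypothetical names and parameters (`cutoff`, `slow_gain`). Computing the short-time transform and reconstructing the time signal are outside its scope.

```python
import numpy as np

def slow_component_response(shape, cutoff, slow_gain):
    """2-D response equal to slow_gain at zero frequency and approaching 1
    for rapidly varying components (Gaussian shape is an assumption)."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    low = np.exp(-(fx**2 + fy**2) / (2.0 * cutoff**2))
    return 1.0 - (1.0 - slow_gain) * low

def compress(log_mag, cutoff=0.05, slow_gain=0.25):
    """Attenuate the slowly changing part of a (frequency x time) log spectrum."""
    H = slow_component_response(log_mag.shape, cutoff, slow_gain)
    return np.real(np.fft.ifft2(np.fft.fft2(log_mag) * H))

def expand(compressed, cutoff=0.05, slow_gain=0.25):
    """Invert compress() by applying the reciprocal response (needs slow_gain > 0)."""
    H = slow_component_response(compressed.shape, cutoff, slow_gain)
    return np.real(np.fft.ifft2(np.fft.fft2(compressed) / H))
```

Keeping `slow_gain` above zero makes the response invertible, so the expander can restore the envelope after the signal has passed through the channel.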

A compression/expansion system based on this concept is
shown in Figure 2. "STFT" represents the short-time Fourier
transform, and "I" represents reconstruction of a time
signal. The block "T" represents the effect of channel
noise, e.g., tape hiss or quantization noise.

The two-dimensional system was simulated for both an
analog and a digital channel. In both cases the system
provided considerable improvement over a similar homomorphic
system. In the analog case, noise was first audible in the
output of the two-dimensional system at a channel
signal-to-noise ratio of 12dB, compared to 30dB for the
one-dimensional homomorphic system. In the digital channel
simulation, the two-dimensional system provided a three-bit
improvement over the homomorphic system. Noise was first
audible in the output of the two-dimensional system with
four bit channel quantization, compared to seven bit
quantization for the homomorphic system. As a reference,
noise is audible in the uncompressed signal with 9-10 bit
quantization.

Although distortions are audible, the system still
produces natural, intelligible speech with three bit channel
quantization.

SECTION 6

LINEAR  PREDICTIVE  CODING  WITH  A GLOTTAL  WAVEFORM  MODEL 

William  J.  Done 

Although  synthetic  speech  generated  with  all-pole  LPC 
techniques  is  intelligible  and  natural  sounding,  it  suffers 
from  a raspy  or  coarse  sound.  This  flaw  is  especially 
annoying  when  the  listener  is  exposed  to  long  segments  of 
synthetic speech. A possible source of this degradation is
the  failure  of  all-pole  linear  predictive  models  to 
accurately  match  zeros  in  the  speech  spectra  or  the  failure 
to  duplicate  (using  impulse  excitation)  the  excitation 
characteristics  of  the  actual  glottal  pulse  wave. 

The linear prediction technique is based on the
approximation of the nth speech sample s(n) as a summation
of the N previous, linearly weighted samples to form $\hat{s}(n)$.
An all-pole inverse filter is obtained by approximating the
spectrum of the error as a constant. By relating the energy
in the error sequence to the integral of the ratio of the
speech power spectrum to the estimate's power spectrum, it
is evident that the all-pole approximation will result in a
good fit to the poles in the spectrum. By this same
process, it is apparent that spectral zeros will not be
matched as well.

To better model the zeros, it is assumed that the error
signal

$$e(n) = s(n) - \hat{s}(n) = s(n) - \sum_{i=1}^{N} a(i)\, s(n-i)$$

could be modelled as the effect of the glottal pulse on the
zeros of the vocal tract. That is,

$$s(n) = \sum_{i=1}^{N} a(i)\, s(n-i) + e(n) = \sum_{i=1}^{N} a(i)\, s(n-i) + \sum_{j=0}^{M} b(j)\, g(n-j),$$


where the b(j) are the zero coefficients and g(n) is an
assumed glottal waveform model. Note that the zero
coefficients are not excited during times when g(n) = 0.
This corresponds to a closed glottis condition. The
analysis-synthesis procedure based on this model for voiced
speech is summarized as follows:

1. The pole coefficients, a(i), are determined by
linear predictive analysis using the covariance
technique.

2.  e(n)  is  generated  from  the  original  speech  and  the 
pole  coefficients. 

3.  Using  the  least  squares  method,  the  zero 
coefficients,  b(j),  are  generated  from  the  assumed 
glottal  waveform  g(n)  and  the  error  signal  e(n). 

4. Synthesis is performed by using ê(n) as the
excitation,

$$s(n) = \sum_{i=1}^{N} a(i)\, s(n-i) + \hat{e}(n),$$

where ê(n) is generated from the glottal waveform
model and the zero coefficients calculated in step
3:

$$\hat{e}(n) = \sum_{j=0}^{M} b(j)\, g(n-j).$$


For unvoiced speech the all-pole model excited by noise is
used [1].
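
The four steps for voiced speech can be sketched in a few lines of Python.
This is a minimal illustration under stated assumptions, not the report's
implementation: the function names (`solve`, `least_squares`, `delayed`,
`analyze`, `synthesize`) are hypothetical, samples before the start of the
frame are taken as zero, and both least-squares fits are solved through
their normal equations.

```python
def solve(A, y):
    """Solve the square system A x = y by Gaussian elimination with pivoting."""
    n = len(y)
    aug = [row[:] + [y[r]] for r, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def least_squares(columns, target):
    """Minimize |sum_k x[k] * columns[k] - target|^2 via the normal equations."""
    A = [[sum(u * v for u, v in zip(ci, ck)) for ck in columns] for ci in columns]
    y = [sum(u * t for u, t in zip(ci, target)) for ci in columns]
    return solve(A, y)


def delayed(x, d, start, stop):
    """Samples x(n - d) for n in [start, stop), with x(n) = 0 for n < 0."""
    return [x[n - d] if n - d >= 0 else 0.0 for n in range(start, stop)]


def analyze(s, g, N, M):
    """Steps 1-3: pole coefficients, error signal, zero coefficients."""
    L = len(s)
    # Step 1: covariance-method fit of a(1..N) over n = N .. L-1 (no windowing).
    a = least_squares([delayed(s, i, N, L) for i in range(1, N + 1)], s[N:])
    # Step 2: e(n) = s(n) - sum_i a(i) s(n-i).
    e = [s[n] - sum(a[i - 1] * s[n - i] for i in range(1, N + 1) if n - i >= 0)
         for n in range(L)]
    # Step 3: least-squares fit of b(0..M) so that sum_j b(j) g(n-j) matches e(n).
    b = least_squares([delayed(g, j, 0, L) for j in range(M + 1)], e)
    return a, e, b


def synthesize(a, b, g):
    """Step 4: drive the all-pole filter with e_hat(n) = sum_j b(j) g(n-j)."""
    e_hat = [sum(b[j] * g[n - j] for j in range(len(b)) if n - j >= 0)
             for n in range(len(g))]
    out = []
    for n in range(len(g)):
        past = sum(a[i - 1] * out[n - i] for i in range(1, len(a) + 1) if n - i >= 0)
        out.append(past + e_hat[n])
    return out
```

Calling `analyze(s, g, N, M)` on one voiced frame `s` with a glottal model
`g` of the same length returns the coefficients that `synthesize(a, b, g)`
needs. On real speech the pole fit in step 1 is only approximate, because
the excitation is not zero over the fitted range.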

The previous discussion proposes a glottal waveform
model to develop the zero coefficients. The models used are
based on results by Rosenberg [2] and Holmes [3].
Construction of the glottal wave for each segment of voiced
speech is based on the pitch value for that segment. The
waveform is generated with a fixed amplitude and
pitch-dependent opening and closing times. Figure 1
illustrates the two models used to date. In the figure, Tp
is the pitch period, Tq the opening time, and Tc the closing
time. The smooth model of Figure 1a) is constructed of
polynomial segments. This was the first model tried, based
on naturalness tests reported in [2]. When the smooth model
failed to reproduce the rapid transitions in the error
signal, the triangular pulse of Figure 1b) was developed. It
achieved better results in duplicating the sharp transitions
of the error signal.
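
A triangular glottal pulse of the kind described can be generated as
follows. This is a sketch under assumptions: the pulse is taken to rise
linearly during the opening time, fall linearly during the closing time, and
stay at zero for the rest of the pitch period. The default fractions are the
example settings the report quotes for the triangular model, read as opening
then closing time. The function names are hypothetical.

```python
def triangular_glottal_pulse(period, open_frac=0.24, close_frac=0.04, amplitude=1.0):
    """One pitch period (in samples) of an assumed triangular glottal pulse."""
    n_open = max(1, round(open_frac * period))    # opening time in samples
    n_close = max(1, round(close_frac * period))  # closing time in samples
    pulse = []
    for n in range(period):
        if n <= n_open:
            # Linear rise from 0 to the fixed amplitude.
            pulse.append(amplitude * n / n_open)
        elif n < n_open + n_close:
            # Linear fall back to 0 (rapid closure).
            pulse.append(amplitude * (n_open + n_close - n) / n_close)
        else:
            # Closed glottis: no excitation.
            pulse.append(0.0)
    return pulse


def glottal_wave(pitch_periods):
    """Concatenate one pulse per pitch period; periods are given in samples."""
    wave = []
    for period in pitch_periods:
        wave.extend(triangular_glottal_pulse(period))
    return wave
```

Because the opening and closing times scale with the pitch period while the
amplitude stays fixed, `glottal_wave([100, 80])` produces two pulses of equal
height and different widths.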

In analyzing the error signal, it is apparent that the
peak-to-peak  excursions  of  the  error  waveform  are  maximum  in 
the  neighborhood  of  an  open  glottis.  For  these  segments, 
the  glottal  pulse  is  forcing  the  vocal  tract,  and  the  speech 
waveform  begins  to  deviate  from  the  response  due  to  vocal 
tract  characteristics.  The  sudden  closure  of  the  glottis 
corresponds  approximately  to  the  large  excursions  in  the 
next  pitch  period.  Because  of  the  changing  waveform 
characteristics  strongly  evident  in  these  areas,  the  error 
signal  resulting  from  a linear  predictive  model  grows  in 
magnitude during these intervals. In order to approximate
these segments more closely than intervals when the glottis
is closed, the error signal is zeroed for the intervals
approximately corresponding to a closed glottis, as
determined from the original speech waveform. Figure 2
illustrates this process.
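
The windowing of the error signal can be sketched as below. The details are
assumptions made for illustration: a constant pitch period in samples,
closure placed at the largest absolute excursion of the speech within each
period, and an open interval covering a fixed fraction of the period just
before closure. The names `closure_instants` and `window_error` and the
default `open_fraction` are hypothetical.

```python
def closure_instants(speech, period):
    """Assumed rule: glottal closure at max |s(n)| within each pitch period."""
    instants = []
    for start in range(0, len(speech), period):
        stop = min(start + period, len(speech))
        instants.append(max(range(start, stop), key=lambda n: abs(speech[n])))
    return instants


def window_error(error, closures, period, open_fraction=0.3):
    """Zero e(n) everywhere except the assumed open interval before each closure."""
    n_open = round(open_fraction * period)
    keep = [False] * len(error)
    for c in closures:
        for n in range(max(0, c - n_open), c + 1):
            keep[n] = True
    return [e if k else 0.0 for e, k in zip(error, keep)]
```

The windowed signal keeps the large excursions near the open glottis and
discards the low-level error during the closed phase, so a later fit
concentrates on the retained samples.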

As mentioned, the triangular glottal pulse model was
needed to better approximate the sharp transitions of the
error signal. M, the order of the zero model, and the
opening and closing times of the glottal pulse influence the
match of the approximate error signal to e(n), the original
error waveform. For example, glottal parameter settings of
Tq = 0.24Tp (opening time) and Tc = 0.04Tp (closing time),
and an order of M = 6 (7 zero coefficients), produce a good
representation of the error signal when the triangular model
is used. Figure 2d) gives the approximate error signal
resulting from this modelling for the vowel /ae/.

In approximating the error signal as

$$\hat{e}(n) = \sum_{j=0}^{M} b(j)\, g(n-j),$$


it was stated that the zeros will have no direct effect when
g(n) = 0. Since the glottal excitation is also often zero
in actual speech production, one of the goals of this
research was to determine whether better estimates of the
pole coefficients could be obtained by analyzing voiced
speech only when the glottis was closed, the speech waveform
primarily representing the vocal tract during that state. A
decrease in computational effort for calculating the
covariance matrix would also be achieved. Prior to loading
the covariance matrix to calculate the a(i) coefficients,
weighted least squares was used to zero the sections of a
pitch period corresponding to the glottis-open state.
Speech occurring during the glottis-closed time was weighted
by one. The glottis-open segment of a pitch period was set
as a fixed percentage of the pitch period. Closure of the
glottis was set at the maximum absolute excursion of the
waveform in the next pitch period. Results indicated that
while use of only 50% of the data in a pitch period might
produce a satisfactory synthesis for one phoneme, the
synthesis for another phoneme might be completely
unacceptable. For this reason all of the speech data is
used in computing the covariance matrix.

Figure 2: Example waveforms for the vowel /ae/.
a) Original speech waveform.
b) The error signal e(n).
c) e(n) windowed to retain region near open glottis.
d) Approximate error signal generated from 7 zero
coefficients.
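
The weighted least squares step in the glottis-closed experiment can be
stated compactly. This is a sketch in assumed notation: the weight w(n) and
the weighted covariance φ_w(i,k) are symbols introduced here, not the
report's.

```latex
\begin{align*}
E_w &= \sum_{n} w(n) \Bigl[ s(n) - \sum_{i=1}^{N} a(i)\, s(n-i) \Bigr]^{2},
\qquad
w(n) = \begin{cases}
  1, & \text{glottis closed at sample } n \\
  0, & \text{glottis open at sample } n
\end{cases} \\
\frac{\partial E_w}{\partial a(k)} = 0
\;&\Longrightarrow\;
\sum_{i=1}^{N} a(i)\, \phi_w(i,k) = \phi_w(0,k), \qquad k = 1, \dots, N \\
\phi_w(i,k) &= \sum_{n} w(n)\, s(n-i)\, s(n-k)
\end{align*}
```

Samples with w(n) = 0 drop out of every sum, which is the source of the
reduced effort in forming the covariance matrix. Setting w(n) = 1 for all n
recovers the ordinary covariance technique that the report finally adopts.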

Synthetic  speech  generated  using  the  glottal  waveform 
modelling  technique  described  here  is  intelligible. 
However, the speech has a muffled sound and lacks the
"sharpness" of the original or all-pole LPC synthesis. This
muffled quality results from a lack of high frequency energy
in the excitation signal generated from the zero
coefficients and glottal waveform model. Presently, work is
being done on improving the synthesis by eliminating the
muffled effect. This seems to require modifications in the
model, especially the derivation of an excitation signal
for the all-pole portion of the synthesizer.

SECTION 7

PERCEPTUALLY  INVARIANT  TRANSFORM  ANALYSIS 
James  T.  Kajiya 

The research proposed here has as its roots the work of
Stockham [1]. There the problem of processing images was
approached with a perceptual model in hand. This approach
differs radically from conventional methods in that it does
not use as its principal model one that describes the method
of image production. Other methods of sensory information
processing such as Linear Predictive Coding, Blind
Deconvolution, etc. attempt to find parameters that
describe the functioning of the production mechanism. In
[1] Stockham deals with the mechanism that consumes the
sensory information.

Image understanding and speech understanding may be
approached in this way. The model for the perceptual
mechanism, however, must be chosen in a new way. Fortunately
some work has already occurred toward this goal.

In 1947 Pitts and McCulloch [2] recognized that images
and sounds can be represented as real-valued functions on
what they termed the visual and aural manifold. Along with
various physiological speculations they recognized that
there exist transformations on the manifold that are useful
in analyzing the process of perception. This set of
transformations forms a Group acting on what can be
considered a homogeneous space. Pitts and McCulloch's
fundamental observation was that the group of
transformations must leave the so-called perceptual
constancies invariant.

W. C. Hoffman [3, 5] developed the group of visual
transformations and recognized that it was endowed with a
Lie Group structure. Basing his work on the experiments of
Hubel and Wiesel he was eventually able to predict some
optical illusions of angle.

Analysis of functions defined on a homogeneous space
also has a rich tradition in modern mathematics and physics.
Powerful methods in Quantum Mechanics used to solve the
Schrödinger eigenvalue problem capitalize upon symmetries in
the Hamiltonian (see Weyl [6]). These symmetries are
expressed as invariances under certain transformations of
spacetime, viz. typically the group of three-dimensional
rotations about the nucleus (O(3)) or the Lorentz group.
Using the theory of Group Representations these methods
decompose the space of complex-valued functions defined on
the spacetime manifold into invariant irreducible subspaces.
These invariant irreducible subspaces for the group O(3) are
finite dimensional and are spanned by the so-called
Spherical Harmonics. The situation for the Lorentz group
(see Wigner [7]) is not quite so felicitous. Its
corresponding invariant irreducible subspaces need not be
finite dimensional. This fact explains in part a major
obstacle to a satisfactory theory of Relativistic Quantum
Mechanics. To gain an insight into the difference between
the two cases cited above we may ask why they are different.
The crucial property we seek is that of the topological
notion of compactness. The group O(3) is compact while the
Lorentz group is noncompact. It is known that all Lie
Groups enjoy the relaxed condition of local compactness.

In the early 1950's G. W. Mackey [8, 9, 10, 11] began to
successfully address the issues of infinite dimensional
locally compact group representations by unitary operators
on a Hilbert space. His work made a significant
contribution to the progress of the theory. Particularly
fundamental was his refinement and extension of the concept
of induced representations first used by Frobenius.

An analysis of some of the more basic perceptual Lie
Transformation Groups shows that they satisfy the algebraic
property of Solvability. The analysis of unitary
representations of solvable Lie Groups quite recently
initiated by Kirillov, Dixmier, Auslander, Moore,
et al. [12, 13, 14, 15, 16, 17] can thus be inferred to have a
major impact on the theory of perception. In fact, it is
easy to see that the situation parallels closely that of
Quantum Mechanics. Indeed, the wave function is a
complex-valued function on the spacetime manifold; similarly
images and sounds are real-valued functions on the visual
and aural manifold. Instead of analysis of a probability
wave into invariant components we see analysis of images or
sounds into components invariant under various perceptual
transformations.

Thus by using the theory of Group Representations we
hope to obtain a number of transforms whose action expresses
closely the invariance that psychophysiologists are wont to
call perceptual constancy. Perhaps even more significant is
the development of a method that generates perceptually
important transforms given models of a perceptual process
couched in the language of Lie Groups. It is in this way
that we hope to further the cause of information processing
in the context of a psychophysiological model.
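
A familiar special case may make the idea of a transform tied to a group of
transformations concrete. The example below is an illustration chosen here,
not one of the perceptual groups discussed in the report: the group is the
cyclic shifts of a sampled signal, the transform is the discrete Fourier
transform, and the invariant quantity is the magnitude of each coefficient.
The function names `dft` and `shift` and the sample values are hypothetical.

```python
import cmath


def dft(x):
    """Discrete Fourier transform, computed directly from its definition."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def shift(x, m):
    """Cyclic shift by m samples: y[t] = x[(t - m) mod n]."""
    n = len(x)
    return [x[(t - m) % n] for t in range(n)]


signal = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0, 0.5, 0.0]
shifted = shift(signal, 3)

# A shift multiplies coefficient k by the phase factor exp(-2*pi*i*k*m/n),
# so each coefficient spans a one-dimensional invariant subspace and its
# magnitude is unchanged by the shift.
magnitudes = [abs(c) for c in dft(signal)]
magnitudes_shifted = [abs(c) for c in dft(shifted)]
```

The two magnitude lists agree to rounding error, while the coefficients
themselves differ only in phase. The transforms sought in this section would
play the same role for groups of perceptual transformations.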

SECTION 8

Image  Understanding 
Martin  Newell 


Work  in  this  new  area  was  conceived  and  proposed  during 
the  early  part  of  this  period.  Detailed  project  planning  is 
currently  underway. 

The analysis of imaging is undertaken for the purposes
of automatic recognition of previously known objects, or for
synthesizing models of previously unknown objects. This
research is based on the hypothesis that such analysis can
benefit greatly if carried out in conjunction with
three-dimensional models of the objects in the scene.

Given modeling and image synthesis facilities of
sufficiently advanced capability, analysis of such scenes
can be carried out using an analysis by synthesis approach.
The analysis cycle starts with some hypothesis about objects
in the actual scene, and their orientation with respect to
the camera. A synthetic image is then created with the
modeling facility and compared with the actual image. The
model of the objects in the synthetic image is then modified
based on differences between these two images, and the cycle
repeated.
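
The analysis cycle can be illustrated with a deliberately small toy problem.
Everything specific in this sketch is an assumption made for illustration:
the "scene" is a single scan line containing one bright bar, the model has
two parameters (position and width), the comparison is a sum of squared
differences, and the model is modified by trying neighbouring parameter
values. The names `render`, `difference`, and `analyze_by_synthesis` are
hypothetical.

```python
def render(position, width, size=32):
    """Toy image synthesis: a scan line with one bright bar."""
    return [1.0 if position <= x < position + width else 0.0 for x in range(size)]


def difference(image_a, image_b):
    """Toy comparison: sum of squared pixel differences."""
    return sum((u - v) ** 2 for u, v in zip(image_a, image_b))


def analyze_by_synthesis(actual, hypothesis, max_cycles=100):
    """Repeat: synthesize from the model, compare, modify the model."""
    best = difference(render(*hypothesis), actual)
    for _ in range(max_cycles):
        position, width = hypothesis
        improved = False
        # Candidate modifications of the current model.
        for candidate in ((position - 1, width), (position + 1, width),
                          (position, width - 1), (position, width + 1)):
            if candidate[1] < 1:
                continue
            d = difference(render(*candidate), actual)
            if d < best:
                hypothesis, best, improved = candidate, d, True
        if not improved:
            break  # no modification reduces the difference any further
    return hypothesis, best


actual_image = render(10, 5)
model, residual = analyze_by_synthesis(actual_image, (8, 3))
```

Starting from the hypothesis (8, 3), the loop converges to the true bar at
(10, 5) with zero residual. A hypothesis that does not overlap the bar at all
gives the comparison nothing to steer by, which is one reason the report
lists perceptually guided comparison and generalized correlation among the
problems to be attacked.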

Four main problem areas will be attacked in order to
develop such a system.

1. Abstraction of perceptually relevant information
from images for use in guiding the comparisons
between the real and synthesized images.

2. Techniques for generalized correlation in both two
and three dimensions for the purposes of finding
the best fit between the real image and a synthetic
image.

3. Synthesis of high fidelity images capable of
reproducing the perceptually important
characteristics of real images.

4. Development of a modeling system of sufficiently
advanced capability for storing and manipulating a
wide variety of object representations.